{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reducing storage usage for vectors on Azure AI Search\n",
    "\n",
    "This code demonstrates how to use the following features to reduce vector storage on Azure AI Search.\n",
    "\n",
    "+ Use smaller \"narrow\" data types instead of `Edm.Single`. Types such as `Edm.Half` reduce storage overhead.\n",
    "+ Disable storing vectors used in the query response. Vectors returned in a query response are stored separately from the vectors used during queries.\n",
    "+ Quantizing vectors. Use built-in scalar or binary quantization to quantize embeddings to `Edm.Int8` without any reduction in query performance. Information loss from quantization can be compensated for using the original unquantized embeddings and oversampling.\n",
    "+ Truncating dimensions. Use built-in truncation dimension option to reduce vector dimensionality with minimal reduction in query performance.\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "+ An Azure subscription.\n",
    " \n",
    "+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up a Python virtual environment in Visual Studio Code\n",
    "\n",
    "1. Open the Command Palette (Ctrl+Shift+P).\n",
    "1. Search for **Python: Create Environment**.\n",
    "1. Select **Venv**.\n",
    "1. Select a Python interpreter. Choose 3.10 or later.\n",
    "\n",
    "It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -r vector-quantization-and-storage-requirements.txt --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load .env file (Copy .env-sample to .env and update accordingly)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "from azure.identity import DefaultAzureCredential\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "import os\n",
    "\n",
    "load_dotenv(override=True) # take environment variables from .env.\n",
    "\n",
    "# Variables not used here do not need to be updated in your .env file\n",
    "endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n",
    "credential = AzureKeyCredential(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\", \"\")) if len(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\", \"\")) > 0 else DefaultAzureCredential()\n",
    "base_index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"teststorage\")\n",
    "embedding_dimensions = int(os.getenv(\"AZURE_OPENAI_EMBEDDING_DIMENSIONS\", 3072))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load embeddings\n",
    "\n",
    "Load the embeddings from a precomputed file. These embeddings use [text-embedding-3-large](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#embeddings) with 3072 dimensions. The chunks are from the sample data in the document folder, chunked using the [Split Skill](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from lib.embeddings import content_path\n",
    "\n",
    "with open(content_path, \"r\") as f:\n",
    "    chunks = json.load(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create indexes\n",
    "\n",
    "To demonstrate the storage impact of the different options, the following code creates indexes that use each option, and another index that combines all the options together"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function to define the indexes on the search service\n",
    "from typing import List\n",
    "from azure.search.documents.indexes import SearchIndexClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SimpleField,\n",
    "    SearchFieldDataType,\n",
    "    VectorSearch,\n",
    "    HnswAlgorithmConfiguration,\n",
    "    VectorSearchProfile,\n",
    "    SemanticConfiguration,\n",
    "    SemanticPrioritizedFields,\n",
    "    SemanticField,\n",
    "    SemanticSearch,\n",
    "    SearchIndex,\n",
    "    SearchField,\n",
    "    ScalarQuantizationCompression,\n",
    "    BinaryQuantizationCompression,\n",
    "    VectorSearchCompression\n",
    ")\n",
    "\n",
    "def create_index(index_name, dimensions, use_scalar_compression=False, use_binary_compression=False, use_float16=False, use_stored=True, truncation_dimension=None):\n",
    "    if use_float16:\n",
    "        vector_type = \"Collection(Edm.Half)\"\n",
    "    else:\n",
    "        vector_type = \"Collection(Edm.Single)\"\n",
    "\n",
    "    # Vector fields that aren't stored can never be returned in the response\n",
    "    fields = [\n",
    "        SimpleField(name=\"id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True),\n",
    "        SearchField(name=\"title\", type=SearchFieldDataType.String),\n",
    "        SearchField(name=\"chunk\", type=SearchFieldDataType.String),\n",
    "        SearchField(name=\"embedding\", type=vector_type, searchable=True, stored=use_stored, vector_search_dimensions=dimensions, vector_search_profile_name=\"myHnswProfile\")\n",
    "    ]\n",
    "\n",
    "    compression_configurations: List[VectorSearchCompression] = []\n",
    "    if use_scalar_compression:\n",
    "        compression_name = \"myCompression\"\n",
    "        compression_configurations = [\n",
    "            ScalarQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)\n",
    "        ]\n",
    "    elif use_binary_compression:\n",
    "        compression_name = \"myCompression\"\n",
    "        compression_configurations = [\n",
    "            BinaryQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)\n",
    "        ]\n",
    "    else:\n",
    "        compression_name = None\n",
    "        compression_configurations = []\n",
    "    \n",
    "    vector_search = VectorSearch(\n",
    "        algorithms=[\n",
    "            HnswAlgorithmConfiguration(name=\"myHnsw\")\n",
    "        ],\n",
    "        profiles=[\n",
    "            VectorSearchProfile(name=\"myHnswProfile\", algorithm_configuration_name=\"myHnsw\", compression_name=compression_name)\n",
    "        ],\n",
    "        compressions=compression_configurations\n",
    "    )\n",
    "\n",
    "    semantic_config = SemanticConfiguration(\n",
    "        name=\"my-semantic-config\",\n",
    "        prioritized_fields=SemanticPrioritizedFields(\n",
    "            title_field=SemanticField(field_name=\"title\"),\n",
    "            content_fields=[SemanticField(field_name=\"chunk\")]\n",
    "        )\n",
    "    )\n",
    "    semantic_search = SemanticSearch(configurations=[semantic_config])\n",
    "\n",
    "    return SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created indexes\n"
     ]
    }
   ],
   "source": [
    "# Create indexes to compare storage usage\n",
    "# The baseline index does not use any options\n",
    "\n",
    "indexes = {\n",
    "    \"baseline\": {},\n",
    "    \"scalar-compression\": {\n",
    "        \"use_scalar_compression\": True\n",
    "    },\n",
    "    \"binary-compression\": {\n",
    "        \"use_binary_compression\": True\n",
    "    },\n",
    "    \"narrow\": {\n",
    "        \"use_float16\": True\n",
    "    },\n",
    "    \"no-stored\": {\n",
    "        \"use_stored\": False\n",
    "    },\n",
    "    \"scalar-compresssion-truncation-dimension\": {\n",
    "        \"use_scalar_compression\": True,\n",
    "        \"truncation_dimension\": 1024\n",
    "    },\n",
    "    \"binary-compression-truncation-dimension\": {\n",
    "        \"use_binary_compression\": True,\n",
    "        \"truncation_dimension\": 1024\n",
    "    },\n",
    "    \"all-options-with-scalar\": {\n",
    "        \"use_scalar_compression\": True,\n",
    "        \"use_float16\": True,\n",
    "        \"use_stored\": False,\n",
    "        \"truncation_dimension\": 1024\n",
    "    },\n",
    "    \"all-options-with-binary\": {\n",
    "        \"use_binary_compression\": True,\n",
    "        \"use_float16\": True,\n",
    "        \"use_stored\": False,\n",
    "        \"truncation_dimension\": 1024\n",
    "    }\n",
    "}\n",
    "\n",
    "search_index_client = SearchIndexClient(endpoint, credential)\n",
    "for index, options in indexes.items():\n",
    "    index = create_index(f\"{base_index_name}-{index}\", dimensions=embedding_dimensions, **options)\n",
    "    search_index_client.create_or_update_index(index)\n",
    "\n",
    "print(\"Created indexes\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function to upload the embeddings to each index\n",
    "\n",
    "import json\n",
    "from lib.embeddings import content_path\n",
    "from azure.search.documents import SearchIndexingBufferedSender\n",
    "\n",
    "def upload_embeddings(index_name):\n",
    "    with open(content_path, \"r\") as f:\n",
    "        content = json.load(f)\n",
    "    \n",
    "    with SearchIndexingBufferedSender(endpoint, index_name, credential) as client:\n",
    "        client.upload_documents(content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "for index in indexes.keys():\n",
    "    upload_embeddings(f\"{base_index_name}-{index}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check storage sizes\n",
    "\n",
    "Find the new storage size in MB to demonstrate how the various options affect storage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "****************************************\n",
      "Index Name: my-demo-index-scalar-compression\n",
      "Storage Size: 19.3605MB\n",
      "Vector Size: 1.2242MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-scalar-compresssion-truncation-dimension\n",
      "Storage Size: 18.5597MB\n",
      "Vector Size: 0.4234MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-binary-compression\n",
      "Storage Size: 18.3085MB\n",
      "Vector Size: 0.1732MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-binary-compression-truncation-dimension\n",
      "Storage Size: 18.2084MB\n",
      "Vector Size: 0.0731MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-baseline\n",
      "Storage Size: 18.1559MB\n",
      "Vector Size: 4.8277MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-narrow\n",
      "Storage Size: 15.7536MB\n",
      "Vector Size: 2.4254MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-no-stored\n",
      "Storage Size: 7.7143MB\n",
      "Vector Size: 4.8277MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-all-options-with-scalar\n",
      "Storage Size: 5.7287MB\n",
      "Vector Size: 0.4245MB\n",
      "****************************************\n",
      "Index Name: my-demo-index-all-options-with-binary\n",
      "Storage Size: 0.0019MB\n",
      "Vector Size: 0.0369MB\n"
     ]
    }
   ],
   "source": [
    "# Please note - there may be delays in finding index statistics after document upload\n",
    "# Index statistics is not a real time API\n",
    "# See https://learn.microsoft.com/rest/api/searchservice/preview-api/get-index-statistics for more information\n",
    "\n",
    "def bytes_to_mb(bytes):\n",
    "    return round(bytes / (1024 * 1024), 4)\n",
    "\n",
    "def find_storage_size_mb(index_name):\n",
    "    response = search_index_client.get_index_statistics(index_name)\n",
    "    return bytes_to_mb(response[\"storage_size\"]), bytes_to_mb(response[\"vector_index_size\"])\n",
    "\n",
    "index_sizes = [(find_storage_size_mb(index_name), index_name) for index_name in (f\"{base_index_name}-{index}\" for index in indexes.keys())]\n",
    "index_sizes.sort(key=lambda item: item[0][0], reverse=True)\n",
    "\n",
    "for ((storage_size, vector_size), index_name) in index_sizes:\n",
    "    print(\"*\" * 40)\n",
    "    print(f\"Index Name: {index_name}\\nStorage Size: {storage_size}MB\\nVector Size: {vector_size}MB\")\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
